Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

Authors

Johannes Hofmann

Jan Treibig

Georg Hager

Gerhard Wellein

Abstract

We examine the Xeon Phi, which is based on Intel’s Many Integrated Core (MIC) architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and ways to enable sensible data sharing between threads despite the lack of a shared last-level cache. Apart from parallelization, SIMD vectorization is critical for good performance on the Xeon Phi; we perform various micro-benchmarks to investigate the platform’s new set of vector instructions, with special emphasis on the newly introduced vector gather capability. We refine a previous performance model for the application and adapt it to the Xeon Phi to validate the performance of our optimized hand-written assembly implementation, as well as the performance of several different auto-vectorization approaches.

Similar resources

Modern Platform for Parallel Algorithms Testing: Java on Intel Xeon Phi

Parallel algorithms are a popular method of increasing system performance. Apart from showing their properties using asymptotic analysis, proof-of-concept implementations and practical experiments are often required. In order to speed up development and provide a simple and easily accessible testing environment that enables execution of reliable experiments, the paper proposes a platform with mu...

Analysis of the Execution-Time Variation of OpenMP-based Applications on the Intel® Xeon Phi™

The Intel® Xeon Phi accelerator is currently being used in several large-scale computer clusters and supercomputers to enhance the execution-time performance of computation-intensive applications. While performing a comprehensive profiling of the Intel® Xeon Phi execution-time behavior of different applications included in the Rodinia Benchmark suite, we observed large variations in applicati...

Understanding the Costs of Many-Task Computing Workloads on Intel Xeon Phi Coprocessors

Many-Task Computing (MTC) aims to bridge the gap between HPC and HTC. MTC emphasizes running many computational tasks over a short period of time, where tasks can be either dependent or independent of one another. MTC has been well supported on Clouds, Grids, and Supercomputers on traditional computing architectures, but the abundance of hybrid large-scale systems using accelerators has motivat...

A Performance and Scalability Analysis of the Tsunami Simulation EasyWave for Different Multi-Core Architectures and Programming Models

In this paper, the performance and scalability of different multi-core systems are experimentally evaluated for the tsunami simulation EasyWave. The target platforms include a standard Ivy Bridge Xeon processor, an Intel Xeon Phi accelerator card, and a GPU. OpenMP, MPI and CUDA were used to parallelize the program for these platforms. The absolute performance of the application on the diffe...

Porting FEASTFLOW to the Intel Xeon Phi: Lessons Learned

In this paper we report our experiences in porting the FEASTFLOW software infrastructure to the Intel Xeon Phi coprocessor. Our efforts involved evaluating programming models, including OpenCL, POSIX threads and OpenMP, as well as typical optimization strategies such as parallelization and vectorization. Since the straightforward porting process of the already existing OpenCL version of the cod...

Publication date: 2014
